134 research outputs found
Automated Map Reading: Image Based Localisation in 2-D Maps Using Binary Semantic Descriptors
We describe a novel approach to image-based localisation in urban
environments using semantic matching between images and a 2-D map. This contrasts
with the vast majority of existing approaches, which use image-to-image database
matching. We use highly compact binary descriptors to represent semantic
features at locations, significantly increasing scalability compared with
existing methods and having the potential for greater invariance to variable
imaging conditions. The approach is also more akin to human map reading, making
it more suited to human-system interaction. The binary descriptors indicate the
presence or not of semantic features relating to buildings and road junctions
in discrete viewing directions. We use CNN classifiers to detect the features
in images and match descriptor estimates with a database of location tagged
descriptors derived from the 2-D map. In isolation, the descriptors are not
sufficiently discriminative, but when concatenated sequentially along a route,
their combination becomes highly distinctive and allows localisation even when
using non-perfect classifiers. Performance is further improved by taking into
account left or right turns over a route. Experimental results obtained using
Google StreetView and OpenStreetMap data show that the approach has
considerable potential, achieving localisation accuracy of around 85% using
routes corresponding to approximately 200 metres.
Comment: 8 pages, submitted to IEEE/RSJ International Conference on Intelligent Robots and Systems 201
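The route-matching idea above can be sketched concretely. This is an illustrative sketch only (the descriptor layout and function names are assumptions, not the authors' code): each location is summarised by a short bit vector marking the presence of semantic features (e.g. building facade, road junction) in discrete viewing directions; individually weak, the descriptors become distinctive once concatenated along a route, and localisation reduces to a minimum-Hamming-distance search.

```python
import numpy as np

def route_descriptor(step_descriptors):
    """Concatenate per-location binary descriptors along a route."""
    return np.concatenate(step_descriptors)

def hamming(a, b):
    """Number of differing bits between two binary descriptors."""
    return int(np.count_nonzero(a != b))

def localise(query, map_routes):
    """Index of the map route closest to the query in Hamming distance."""
    return int(np.argmin([hamming(query, r) for r in map_routes]))
```

Because the match is over concatenated descriptors, a few bit errors from imperfect classifiers still leave the correct route as the nearest neighbour.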
Predicting Out-of-View Feature Points for Model-Based Camera Pose Estimation
In this work we present a novel framework that uses deep learning to predict
object feature points that are out-of-view in the input image. This system was
developed with the application of model-based tracking in mind, particularly in
the case of autonomous inspection robots, where only partial views of the
object are available. Out-of-view prediction is enabled by applying scaling to
the feature point labels during network training. This is combined with a
recurrent neural network architecture designed to provide the final prediction
layers with rich feature information from across the spatial extent of the
input image. To show the versatility of these out-of-view predictions, we
describe how to integrate them in both a particle filter tracker and an
optimisation based tracker. To evaluate our work we compared our framework with
one that predicts only points inside the image. We show that as the amount of
the object in view decreases, being able to predict outside the image bounds
adds robustness to the final pose estimation.
Comment: Submitted to IROS 201
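The label scaling that enables out-of-view prediction can be illustrated in a few lines. This is a hedged sketch (the scale factor and helper names are assumptions, not the paper's values): scaling point labels lets coordinates that fall outside the image still map into the bounded range of a network output such as tanh in [-1, 1].

```python
def scale_label(x, y, w, h, s=2.0):
    """Map pixel coords, possibly outside [0, w) x [0, h), into [-1, 1],
    provided they lie within s half-image-sizes of the image centre."""
    cx, cy = w / 2.0, h / 2.0
    return (x - cx) / (s * cx), (y - cy) / (s * cy)

def unscale_label(u, v, w, h, s=2.0):
    """Inverse mapping from scaled labels back to pixel coordinates."""
    cx, cy = w / 2.0, h / 2.0
    return u * s * cx + cx, v * s * cy + cy
```

With s = 2, points up to one full image width or height beyond the frame remain representable; predictions are un-scaled back to pixel coordinates before feeding the pose estimator.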
The multiresolution Fourier transform : a general purpose tool for image analysis
The extraction of meaningful features from an image forms an important area of image
analysis. It enables the task of understanding visual information to be implemented in a
coherent and well defined manner. However, although many of the traditional approaches to
feature extraction have proved to be successful in specific areas, recent work has suggested
that they do not provide sufficient generality when dealing with complex analysis problems
such as those presented by natural images.
This thesis considers the problem of deriving an image description which could form the basis
of a more general approach to feature extraction. It is argued that an essential property of such
a description is that it should have locality in both the spatial domain and in some
classification space over a range of scales. Using the 2-d Fourier domain as a classification
space, a number of image transforms that might provide the required description are investigated.
These include combined representations such as a 2-d version of the short-time Fourier
transform (STFT), and multiscale or pyramid representations such as the wavelet transform.
However, it is shown that these are limited in their ability to provide sufficient locality in both
domains and as such do not fulfil the requirement for generality.
To overcome this limitation, an alternative approach is proposed in the form of the multiresolution
Fourier transform (MFT). This has a hierarchical structure in which the outermost levels
are the image and its discrete Fourier transform (DFT), whilst the intermediate levels are
combined representations in space and spatial frequency. These levels are defined to be
optimal in terms of locality and their resolution is such that within the transform as a whole
there is a uniform variation in resolution between the spatial domain and the spatial frequency
domain. This ensures that locality is provided in both domains over a range of scales. The
MFT is also invertible and amenable to efficient computation via familiar signal processing
techniques. Examples and experiments illustrating its properties are presented.
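The combined space and spatial-frequency levels described above can be sketched as a windowed Fourier representation (the notation here is illustrative, not the thesis's):

```latex
\[
\hat{x}_{\ell}(n, k) \;=\; \sum_{m=0}^{N-1} x(m)\, w_{\ell}(m - n)\,
  e^{-j 2\pi k m / N_{\ell}}
\]
```

where the window \(w_{\ell}\) widens with level \(\ell\), trading spatial locality for spatial-frequency locality: a delta window recovers the image itself and a constant window the full DFT, matching the outermost levels of the hierarchy.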
The problem of extracting local image features such as lines and edges is then considered. A
multiresolution image model based on these features is defined and it is shown that the MFT
provides an effective tool for estimating its parameters. The model is also suitable for
representing curves and a curve extraction algorithm is described. The results presented for
synthetic and natural images compare favourably with existing methods. Furthermore, when
coupled with the previous work in this area, they demonstrate that the MFT has the potential
to provide a basis for the solution of general image analysis problems.
iDF-SLAM: End-to-End RGB-D SLAM with Neural Implicit Mapping and Deep Feature Tracking
We propose a novel end-to-end RGB-D SLAM, iDF-SLAM, which adopts a
feature-based deep neural tracker as the front-end and a NeRF-style neural
implicit mapper as the back-end. The neural implicit mapper is trained
on-the-fly, while the neural tracker, though pretrained on the ScanNet
dataset, is also finetuned alongside the training of the neural implicit
mapper. Under such a design, our iDF-SLAM is capable of learning to use
scene-specific features for camera tracking, thus enabling lifelong learning of
the SLAM system. Both the training for the tracker and the mapper are
self-supervised without introducing ground truth poses. We test the performance
of our iDF-SLAM on the Replica and ScanNet datasets and compare the results to
the two recent NeRF-based neural SLAM systems. The proposed iDF-SLAM
demonstrates state-of-the-art results in terms of scene reconstruction and
competitive performance in camera tracking.
Comment: 7 pages, 6 figures, 3 tables
Dual-Domain Image Synthesis using Segmentation-Guided GAN
We introduce a segmentation-guided approach to synthesise images that
integrate features from two distinct domains. Images synthesised by our
dual-domain model belong to one domain within the semantic mask, and to another
in the rest of the image - smoothly integrated. We build on the successes of
few-shot StyleGAN and single-shot semantic segmentation to minimise the amount
of training required in utilising two domains. The method combines a few-shot
cross-domain StyleGAN with a latent optimiser to achieve images containing
features of two distinct domains. We use a segmentation-guided perceptual loss,
which compares both pixel values and network activations between domain-specific and
dual-domain synthetic images. Results demonstrate qualitatively and
quantitatively that our model is capable of synthesising dual-domain images on
a variety of objects (faces, horses, cats, cars), domains (natural, caricature,
sketches) and part-based masks (eyes, nose, mouth, hair, car bonnet). The code
is publicly available at:
https://github.com/denabazazian/Dual-Domain-Synthesis
Comment: CVPR2022 Workshops. 14 pages, 19 figures
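The segmentation-guided loss can be sketched with a mask-weighted comparison. This is an illustrative sketch (the function names and the identity "feature extractor" are assumptions, not the released code): the loss pulls the synthesised image toward domain A inside the mask and toward domain B outside it, at both the pixel level and the activation level.

```python
import numpy as np

def dual_domain_loss(synth, ref_a, ref_b, mask, feats=lambda x: x):
    """Segmentation-guided perceptual-style loss; `feats` stands in for a
    real feature network (e.g. VGG activations), not modelled here."""
    m = mask.astype(float)
    # Pixel-level term: domain A inside the mask, domain B outside.
    pix = np.mean(m * (synth - ref_a) ** 2 + (1 - m) * (synth - ref_b) ** 2)
    # Activation-level term with the same mask weighting.
    fs, fa, fb = feats(synth), feats(ref_a), feats(ref_b)
    act = np.mean(m * (fs - fa) ** 2 + (1 - m) * (fs - fb) ** 2)
    return float(pix + act)
```

A perfect composite, matching domain A under the mask and domain B elsewhere, drives both terms to zero, which is what the latent optimiser works toward.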
HDRFusion: HDR SLAM using a low-cost auto-exposure RGB-D sensor
We describe a new method for comparing frame appearance in a frame-to-model
3-D mapping and tracking system using a low dynamic range (LDR) RGB-D camera
which is robust to brightness changes caused by auto exposure. It is based on a
normalised radiance measure which is invariant to exposure changes and not only
robustifies the tracking under changing lighting conditions, but also enables
the subsequent exposure compensation to perform accurately, allowing online building
of high dynamic range (HDR) maps. The latter helps the frame-to-model
tracking to minimise drift and better captures light variation within
the scene. Results from experiments with synthetic and real data demonstrate
that the method provides both improved tracking and maps with far greater
dynamic range of luminosity.
Comment: 14 pages
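The exposure-invariance idea can be illustrated simply. This is a hedged sketch (not the paper's exact normalised radiance measure): dividing linear intensities by their frame mean cancels a global exposure gain, so two frames of the same scene taken under different auto-exposure settings compare consistently.

```python
import numpy as np

def normalised(frame):
    """Exposure-invariant version of a linear-intensity frame."""
    f = np.asarray(frame, dtype=float)
    return f / (f.mean() + 1e-12)  # epsilon guards against empty/black frames

def photometric_error(frame_a, frame_b):
    """Mean squared difference between exposure-normalised frames."""
    return float(np.mean((normalised(frame_a) - normalised(frame_b)) ** 2))
```

A comparison of this kind is what lets frame-to-model tracking remain stable while the camera's auto-exposure changes the raw brightness between frames.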